{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a6457769-dfaf-4241-ab32-dcf901dde902",
   "metadata": {
    "tags": []
   },
   "source": [
    "## GPT Keyword Table Index Comparisons\n",
    "\n",
    "Comparing SimpleKeywordTableIndex, RAKEKeywordTableIndex, KeywordTableIndex.\n",
    "\n",
    "- SimpleKeywordTableIndex - uses simple regex to extract keywords.\n",
    "- RAKEKeywordTableIndex - uses RAKE to extract keywords.\n",
    "- KeywordTableIndex - uses GPT to extract keywords."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "075080e5-c255-4a5c-9330-9da11532e1c8",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### SimpleKeywordTableIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b367b7ef-6a7d-4aee-b174-dba6ec4d2e21",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package stopwords to /home/jerry/nltk_data...\n",
      "[nltk_data]   Package stopwords is already up-to-date!\n"
     ]
    }
   ],
   "source": [
    "from llama_index import SimpleKeywordTableIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f8248fa-e0bd-494a-ad68-8192ccc87696",
   "metadata": {},
   "outputs": [],
   "source": [
    "# build keyword index\n",
    "documents = SimpleDirectoryReader(\"data\").load_data()\n",
    "index = SimpleKeywordTableIndex(documents)\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53833655-0296-4bcb-b501-259b043d68b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do after his time at YC?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "62bcca18-b644-4393-ad29-6c5f0424fb22",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "\n",
       "The author went on to write essays and work on other projects, including a new version of the Arc programming language and Hacker News. He also started painting, but stopped after a few months. In 2015, he started working on a new Lisp programming language, which he finished in 2019. The author then moved to England in 2016 with his family and continued writing essays. In 2019, he finished Bel and wrote a bunch of essays on various topics.\n",
       "\n",
       "The author also worked on building online stores in 1995 after finishing ANSI Common Lisp. He ran the software on servers and let users control it by clicking on links, which was a new concept at the time. In 1996, he co-founded Viaweb with Robert Morris, which was later acquired by Yahoo in 1998. After leaving Yahoo, the author moved back to New York and started painting again. In 2000, he had the idea for a web application that would let people edit code on a server and host the resulting applications, which later became known as \"Reddit\".</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d24f9a20-48a6-4131-91b9-b01448c6ecb5",
   "metadata": {},
   "source": [
    "#### RAKEKeywordTableIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c4d3f293-e608-4b90-86aa-9bce666dbcd5",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package stopwords to /home/jerry/nltk_data...\n",
      "[nltk_data]   Package stopwords is already up-to-date!\n"
     ]
    }
   ],
   "source": [
    "from llama_index import RAKEKeywordTableIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66b1da3b-8231-4da9-8026-4f95481c79df",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# build keyword index\n",
    "documents = SimpleDirectoryReader(\"data\").load_data()\n",
    "index = RAKEKeywordTableIndex(documents)\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f13e5543-c6cb-4651-986c-ecde0f4bf789",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Starting query: What did the author do after his time at YC?\n",
      "Extracted keywords: []\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What did the author do after his time at YC?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "5ae01ac3-55fa-43a3-9b24-f733072d5f8d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>Empty response</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59cee6cf-92df-40d8-8dad-a40b792de96f",
   "metadata": {},
   "source": [
    "#### KeywordTableIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "78d59ef6-70b0-47bb-818d-7237a3b7de75",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import KeywordTableIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a3f1c67-6d73-4f37-afcf-9e637002fcff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# build keyword index\n",
    "documents = SimpleDirectoryReader(\"data\").load_data()\n",
    "index = KeywordTableIndex.from_documents(documents)\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69d4f686-6825-49cf-a113-d2fdd484de77",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do after his time at Y Combinator?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a483514d-4ab5-489d-8b99-7250df491ce3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "\n",
       "After a few years, the author decided to step away from Y Combinator to focus on other projects, such as painting and writing essays. In 2013, he handed over control of Y Combinator to Sam Altman. The author's mother passed away in 2014, and after taking some time to grieve, he returned to writing essays and working on Lisp. He continued working on Lisp until 2019, when he finally completed the project.\n",
       "\n",
       "In 2015, the author decided to move to England with his family. They originally intended to only stay for a year, but ended up liking it so much that they remained there. The author wrote Bel while living in England. In 2019, he finally finished the project. After completing Bel, the author wrote a number of essays on various topics. He continued writing essays through 2020, but also started thinking about other things he could work on.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "112e21ee-587c-4d8b-871e-cb99b94e3778",
   "metadata": {},
   "source": [
    "## GPT Keyword Table Query Comparisons\n",
    "Compare retriever_mode={\"default\", \"simple\", \"rake\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3029961a-ec22-42a1-90d6-f5892eb81e34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# build table with default KeywordTableIndex\n",
    "from llama_index import KeywordTableIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display\n",
    "\n",
    "documents = SimpleDirectoryReader(\"data\").load_data()\n",
    "index = KeywordTableIndex.from_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d75b31da-4788-4295-8642-07ac5c4f11a5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Starting query: What did the author do after his time at Y Combinator?\n",
      "Extracted keywords: ['y combinator', 'combinator']\n",
      "> Querying with idx: 235042210695008001: of excluding them, because there were so many s...\n",
      "> Querying with idx: 7029274505691774319: it was like living in another country, and sinc...\n",
      "> Querying with idx: 1773317813360405038: browser, and then host the resulting applicatio...\n",
      "> Querying with idx: 3866067077574405334: person, and from those we picked 8 to fund. The...\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "\n",
       "The author went on to write a book about his experiences at Y Combinator, and then moved to England. He started writing essays again and also began working on a new Lisp programming language. He also wrote an essay about how he chooses what to work on.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# default\n",
    "query_engine = index.as_query_engine(retriever_mode=\"default\")\n",
    "response = query_engine.query(\"What did the author do after his time at Y Combinator?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "07b713f4-adfc-46f7-a795-5b333e33d49d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Starting query: What did the author do after his time at Y Combinator?\n",
      "Extracted keywords: ['combinator']\n",
      "> Querying with idx: 235042210695008001: of excluding them, because there were so many s...\n",
      "> Querying with idx: 7029274505691774319: it was like living in another country, and sinc...\n",
      "> Querying with idx: 1773317813360405038: browser, and then host the resulting applicatio...\n",
      "> Querying with idx: 3866067077574405334: person, and from those we picked 8 to fund. The...\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "\n",
       "The author went on to write a book about his experiences at Y Combinator, and then moved to England. He started writing essays again and also began working on a new Lisp programming language. He also wrote an essay about how he chooses what to work on.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# simple\n",
    "query_engine = index.as_query_engine(retriever_mode=\"simple\")\n",
    "response = query_engine.query(\"What did the author do after his time at Y Combinator?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d2e19ad9-3190-45e5-a28d-235c28296d70",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Starting query: What did the author do after his time at Y Combinator?\n",
      "Extracted keywords: ['combinator']\n",
      "> Querying with idx: 235042210695008001: of excluding them, because there were so many s...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package punkt to /home/jerry/nltk_data...\n",
      "[nltk_data]   Package punkt is already up-to-date!\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Querying with idx: 7029274505691774319: it was like living in another country, and sinc...\n",
      "> Querying with idx: 1773317813360405038: browser, and then host the resulting applicatio...\n",
      "> Querying with idx: 3866067077574405334: person, and from those we picked 8 to fund. The...\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "\n",
       "The author went on to write a book about his experiences at Y Combinator, and then moved to England. He started writing essays again and also began working on a new Lisp programming language. He also wrote an essay about how he chooses what to work on.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# rake\n",
    "query_engine = index.as_query_engine(retriever_mode=\"rake\")\n",
    "response = query_engine.query(\"What did the author do after his time at Y Combinator?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "myvenv",
   "language": "python",
   "name": "myvenv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
